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Abstract 


In this paper, we theoretically propose a new hashing scheme to establish the sparse Fourier 
transform in high-dimensional space. The estimation of the algorithm complexity shows that this 
sparse Fourier transform can overcome the curse of dimensionality. To the best of our knowledge, 
this is the first polynomial-time algorithm to recover the high-dimensional continuous frequencies. 
Keywords: Curse of dimensionality, Frequency estimation, Runtime complexity, Sparse Fourier 
transform. 


1 Introduction 


Discrete Fourier transform (DFT) plays a fundamental role in signal processing, scientific computing 
and other fields. Its runtime complexity can reach Nlog(N) (where N represents the signal size) 
by the fast Fourier transform algorithm. This runtime complexity is still too high for the ultrawide 
bandwidth signals or high-dimensional signals. If we increase the prior information of the signals, that 
is, assuming the sparsity of the frequencies, then the sparse discrete Fourier transform (SFT) [1] will 
have more advantages than the DFT. The sample complexity and runtime complexity of the SFT are 
mainly affected by the sparsity, and less affected by the bandwidth. The SFT originated from the 
work on the Hadamard transform [2] and has received continuous attention from applied mathematics 
[3, 4, 5, 6, 7, 8], signal processing [9, 10, 11, 12, 13, 14], and theoretical computer science communities 
(15, 16, 17, 18, 19, 20] over the last two decades. Most of the relevant works deals with the discrete 
case where the frequencies are on the grid. Under such condition, the SFT can overcome the curse 
of dimensionality [20, 21, 22, 23]. However, this condition that the frequencies are on the grid is so 
strong. It is natural for people to consider the case that the frequencies are in a continuous region. 
This has led researchers to establish the sparse Fourier transform in the one-dimensional continuous 
setting [18, 24, 25]. 

Recently, [26] initiates the study on the SFT in the high-dimensional continuous setting. Unfortu- 
nately, their SFT method is still subject to the curse of dimensionality, namely, its runtime complexity 
is greater than 204 (d stands for the dimension). The main difficulty in estimating high-dimensional 
continuous frequencies is that when we fix a set of discrete orthogonal bases whose frequencies are 
located on the grid, there is always a single-frequency signal with the frequency located in a continuous 
region, which is not sparse under the discrete Fourier transform (see “the windowing effect” in [11]), 
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and as the dimension increases, the Lı norm of the frequency coefficients increases exponentially. On 
the other hand, in a physical sense, the single-frequency signals should obviously be sparse. 

In this paper, we present a new hashing scheme to transform the high-dimensional SFT into the 
one-dimensional SFT. The computational complexity of this algorithm is polynomial, which means 
that the algorithm can break the curse of dimensionality. To the best of our knowledge, this is the 
first polynomial-time algorithm to recover the off-grid high-dimensional frequencies. 

Formally, we consider the signal f of the following form 


k 


k 
F(t) = D7 FO) S Da; exp(2miw; - t), (1.1) 
j=l 


j=l 


where t € R?, w; € [-M, M]*, aj € C, 0 < A’ < |a;| < A for all j = 1,2,...,& and mini <j<cj<k |wi — 
w;| >n. 
Our goal is to recover w;,a;,j =1,2,...,k. 


2 Main Result 


We first give some symbols and notions. Let sgn (x) =1,when 2 > 0; otherwise, sgn (x) = —1. Let 
J* denote the transpose of the matrix J. Denote for each t € R by |t] the largest integer not bigger 
than t. Let us say that U C E is the esupport set of the function f if Jew |f|? < e, where F is the 


domain of f. Let 
A Ly wa —TF TF i 
= Te npe € SIS 
ry (5. 3) m zn| i ete 


aJ(& ee -TF TF , 
rtf e aa 


The discrete Fourier transform of the function g takes the following form 


and 


Flo\(é) = aa J» g(x) exp(—2miz-), £ €T», 
rely 


and the inverse discrete Fourier transform of the function g is defined by 


L > gE eplir- £), zen. 


OERA 
OY UT 


For z,y € T1, £ +y Ê xz +y(modT) € T4. For x,y € T2, z +y Ê z + y(modF) € Po. 
The “bucket” in the frequency domain is defined by 


TF(j—1) FT TFj FT . 
F , Lig ss 
s 2 s 2 


B; £ {(G/T.....f4/7) ETa: gae | (2.2) 


The hashing transform (in the frequency domain) is defined by H (£) = hé — (0,...,0,6/T)*, where 


o [lai 0 
ka (2.3) 


here Ig_; is the identity matrix of order d — 1, Vp = (hi,...,ha_-1) and {hy,..., hq} are independent 


draws from the uniform distribution on the set {2n +1 : n € Z}N[—VdF/(n), VdF/(n)]. The random 


variable b obeys the uniform distribution on the set Z N {[- 4E, *)}. 


In this paper, we assume that T, F, s are powers of 2 and 1 < s < F. Next, we give a key lemma. 


Lemma 2.1 Suppose 0 <6 < 4, 0 < € < min{1,n, 4, rae r= OF), T = O(k>sd"/? /(€2n63)), 
F = sM. With probability at least 1 — 6/4 over the randomness of {hi,...,ha,b}, for each j € 
{1,...,s}, either 
A 2rib(hg) z4 F- in 2 22 
ha ® pg le Fe, Flt (Yo)? < Resa) eA 
xer 


or there exists a unique jp € {1,2,..., k} such that 


2rib(hq) tx 


© Ij; ê TF = YS fal -e Tr“ FXp, - Fl ful|((h-+)*2)|? < A2e76/(ds). (2.5) 
Dn zer 
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= and 


By Markov’s inequality, we have a corollary. 


Corollary 2.2 All parameters are set as Lemma 2.1. For any l € {1,2,...,d}, choosing the ran- 


3 dom vector Xı © (£1,...,Z1—1, Zl+1;---; Zd) which obeys the uniform distribution on the set Z%-i n 
; 5 =f Aye, | then with probability at lenge = 6/(4ds) over the randomness of (%1,..-,%1-1, Tl+1; -< -, Za); 


— we gic 
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Where xz = (a) Fs ay Pay) Fa) Pcs ay FY". 


We explain Lemma 2.1 at a high level. It is well known that for the discrete Fourier transform, 
enlarge the sampling interval can enhance the frequency concentration (the frequency resolution) of 
each component fj, and the frequency gap between different components is greater than a constant 
value 7. Therefore, with the expanding of the sampling interval, 7 will be greater than the radius be 
of the e-support set (let aa = = €/k? in Lemma 3.1) for each signal component in the frequency 
domain. And the hashing transform we defined can transform the gap between frequencies to the 
gap between the last coordinates of the frequencies (see Lemma 3.2). When 7 > O(dk?./67) , this 
hashing transform can keep the main frequencies ¢;,3. (defined in Lemma 3.1) of the same component 
into the same bucket, and different components’ e-support sets ¢;,3., jB, t # j can be isolated into 
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Figure 2.1: The left side of the figure shows the spectrogram of the two-dimensional signal exp(i27w - 
t) + exp(i27w2-t) under the discrete Fourier transform, where e is the radius of the e-support set 
$1,8. 2,8. in the frequency domain. The right side of the figure is the spectrogram after hashing 
transform, by Lemma 3.2, we have |(h(w1 — w2))2| > 2F/s. By increasing the sampling interval T, we 
can make le E) < O(F/s). 


different buckets (see Corollary 3.3). The figure 2.1 shows how the frequencies of the two-dimensional 
signal is transformed under the hashing operator (2.3). 

When the different components of the signal are isolated, we only need to recover the frequency and 
amplitude in each bucket. By Lemma 2.1, there are only two cases of such amplitude and frequency, 
one is that they are close to the amplitude and frequency of a certain component (up to the hashing 
transformation), and the other is that the amplitude is less than O (e). 

Specifically, for 1 < j < s, 1 < l < d, set 


gj (t) = exp (zroh) Yn (F), ` F| fal] (h zut), 


where pı( yr) = HEL, when | = d; otherwise, pi (HE) = 74 and 

Lt = (x1, «e+, Tl—1; [tF | /F, Tl41,++- tig). 
Here (#1,...,2j-1,%141,---, qa) is the random vector obeys the uniform distribution on the set Z4-!'N 
[5 = IE )¢-!. The sample value of the function gj, is calculated by the method in Lemma 4.1. 


T F > O(VdM/e), by Lemma 2.1 and Corollary 2.2, either there is a signal component fj, with 
frequency w;, such that 


1 . 
al lgia(t) — 5,05, ure" Pat < ORE), 
T —T/2<t<T/2 


or 
1 
2 f lgjt(t)[2dt < O(A22), 
T J-T/2<t<T/2 


where w;, denotes l-th component of the vector wj, and 
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We have the following algorithm: For each j € {1,2,...,s}, we first use the algorithm in [24] to 
recover the amplitude and frequency of gjı(t). If the absolute value of the output amplitude less 
than A’ /2, we turn to recover the amplitude and frequency of g;+1,1(t); otherwise, we keep WF 4 
and continue to use the algorithm in [24] to recover the frequencies of g;9(t),...,9;,a(t). Finally, 
{w MNP wep nE WF qs can be recovered element-wisely. 

Note: From Theorem 1.1 in [24], it takes at most O(dslog(TF’) log(ds/(de))/d) samples of the 
function gj; to recover the frequency w;,, (up to the accuracy A*e/(T'A’)) with probability at least 
1 — ô/ (4ds). By Lemma 4.1, in order to get O(dslog(T F) log(ds/(de))) samples (up to the accuracy 
Ae) of gj. with probability at least 1 — 6, we need a total of 


O(kdsln?(TF + 1) log(T F) log(T Fds/(de)) ln?(1/5)/(de?)) 
samples and running time. Therefore, we have the following result. 


Theorem 2.3 All parameters are set as Lemma 2.1, besides, F > O(VdM/e). The above algorithm 
can output {w?,..., we} such that 


Ava, 2 comes 
A'T Vdk4 
holds with probability at least 1—6 over the randomness of {hi,..., ha, b}, {X1}; (see Corollary 2.2), 


T (see Lemma 4.1) and the algorithm in [24]. The runtime complexity and sample complexity are at 
most 


Jw? — w;| < Of 


i ), i=1,2,...,k. 


O(k?d?s” In? (TF + 1) log(TF) log(T Fds/(ée)) In?(1/6)/(€76)). 


The success probability can be boosted to 1 by repeatedly restarting (as indicated in ([15, 24])). 
2-3/ A! 

ae ) and e < ņ. As the 

sampling interval increases, the different signal components f; are nearly orthogonal, that is 

(E/K)? freroajeqa FE at 

aj(/k)* Sieppo et dt + Diale hrejoajere ne at 


Next, we consider restoring the amplitudes. Notice that |w? — w;| < O( 


= aj(1+ OB") + Dh (P/F) heppje memoria ae 
< aj(1+ OA) + (k — 1)AO(C2/nk). 
Then 
ay = (2/k)* f f(t)e 2S dt + O(A?e/ (4). (2.8) 
tE[0,k/e2]4 


We can use the Monte Carlo method to compute the integral 


(2/k)4 f fOe tat, 
te[0,k/e?]4 


Choosing the random samples { f (t;)e 777" }:, obeying the uniform distribution on [0, k/e?]¢, where 
N; = O(log(1/5)A’"/(Ae)?). By Hoeffding’s inequality, with probability at least 1 — 6, 


1 5. O e oO 
— S (tieit — (€?/k)4 fier dt] < aed (2.9) 
Ne = te [0,k/e2]4 


Therefore, the amplitude a; can be recovered (up to O(A*«/A’)). 


3 Proof of Lemma 2.1 


To prove Lemma 2.1, we need some technical lemmas. 


Lemma 3.1 Let 0 < 8 < F/2, for each fj, the following inequalities 


3/2 42 
T D eO FLNO - FOP SOG), (3.10) 
EcTo 


holds for any j € {1,2,...,k}. Where 


Mo a a ee! eee 


0 otherwise 


Proof: For any a € (—1/T,1/T) and any n € ZN ([2, FT/2) U [-FT/2, —2]), we have 


rel D Cries eo(( -1 ) (i) )| < Te esea a p <OW? 


jezo[=FE, TE) 


(3.11) 
Since |a;| < A, by (3.11), using Parseval’s identity, we have 
(2/2 A2 
ai D Eal) FIO -FENO < OF Te (3.12) 
EcT2 
which completes the proof. 
Lemma 3.2 Lets = O(k?/5), F =sM. For any vector w € [—2M,2M]¢ with |w | > n, the inequality 


\(h(w'))al > 2F/s holds with probability at least 1 — 5/ (8k?) over the randomness of hy, ho,...,ha, 
where (h(w ))q denotes d-th component of the vector h(w'), w, denotes i-th component of the vector 
w, {hy,...,ha} are independent draw from the uniform diate bien on the set {2n+1:n E€ Z}A 


[-VaF/(n), VdF/(n)]. 


Proof: Without loss of generality, let us assume that the d-th component wy of w is greater than n/ Vad. 
Fix arbitrary hi,...,hq_1, since n/Vd < |w,| < 2M, with probability at most O(1/s) (no greater than 
the ratio of the width of the set [—2F'/s, 2F'/s] to O(F), see the figure 3.2) over the randomness of hg, 
we have 5i hiw; € [-2F/s,2F/s]), which complete the proof. 
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Figure 3.2: Let us assume wy > n/Vd. Fix arbitrary hy,...,hg_1. Without loss of generality, 
suppose (h1,...,ha-1,1)* -w E€ [—2F/s,2F/s]. The red region represents the range of ha such that 
es hiw; € [-2F/s,2F/s]. As hg traverses the set {2n +1: n € Z}N[-VdF/(n), VdF/(n)], the 
ratio of the area of the red region to the total area (the sum of the red region and the blue region) is 
O(1/s) , which is approximately equal to the success probability for Fa hiw; € [-2F/s,2F/s]. 


Notice that the number of vectors {w; — wj: 1 <i < j < k} is k(k — 1)/2. From Lemma 3.2, the 
probability of the union of the collision events (namely, (h(w; — w;))a € (—2F/s,2F/s)) is at most 


where H denotes the number of elements in the set {2n +1 : n € Z}4n [-VdF/(n), VdF/(n)|?. 
Therefore, we have a following consequence. 


Corollary 3.3 All parameters are set as above lemma. Suppose the vectors w,,...,wg € [-M, M]? 
satisfy 

min |w; — w;| >n, 

ij 


then 


j ,— W; OF 
oo w;))a| > 2F/s 


holds with probability at least 1 — 6/8 over the randomness of hy, h2,..., ha. 


Suppose dF 8/ņn < F6/(8ks). For any &1, € € Qjo, we have |(h(,) — h(&2))a| < OF /(8ks) < F/s. 
However, this does not mean that there is a bucket B;, such that h(¢;,3) C Bjs- It is possible that 
h(ġ; g) intersects on the boundary of some bucket. Obviously, after adding random translation, with 
a certain probability (the ratio of the total “width” of the sets jg, j = 1,2,...,k to the “width” of 
the bucket), there exists a bucket B;, such that H(¢; g) is completely inside it, that is, the following 
conclusion holds. 


Lemma 3.4 Suppose dF'B/n < F6/(8ks), with probability at least 1 — 6/8 over the randomness of b, 
there exists a bucket Bj, such that H (j8) C Bj, for j=1,2,...,k. 


Now we give the proof of Lemma 2.1. Choosing s = O((k?/5)), 8 = 7d/(8dks), F = sM and 
T= O(këd5/? /((52)?nd?)). By the inequality (3.12), we have 

1 

T X 1%6;,6(6) FINE) — FME? < O(47e76/(dsk?)). (3.13) 


EET2 


From Corollary 3.3, Lemma 3.4 we know that for each bucket B;, there is at most one set $j, 8 
such that H(¢;,,3) C Bj. Using the inequality (3.13), since X4, ,(H~'(&)) = Xi($, s) (£), then with 
probability at least 1 — 6/4 over the randomness of {h1,...,ha,b}, for each bucket B;, either 


A X |, (6) - FIETHE)) — F (f) (H78)? < O(A?e?5/(ds)), (3.14) 
€€T2 
al XO |B, (€) - FLA E)? < O(A7e?6/(ds)). (3.15) 
EcT2 


Since F| fy] (£) = F[f](H—!), we complete the proof by Parseval’s identity. 


4 Proof of Lemma 4.1 


The following lemma reduces the computational complexity of the convolution operation. The classical 
method to reduce the computational complexity (from Poly(TF) to Poly(Log(TF)) is to select an 
appropriate window, such as the Sinc x Gaussian window [17], which can approximate the indicative 
function in the frequency domain and rapidly decay in the time domain. In this paper, we do not add 
windows for brevity, instead, we choose the appropriate sampling method. 


Lemma 4.1 For any 0 < e< 1/2, 1< j < s, suppose 


N= o(<* In? (TF) xaa), 


e2 


then with probability at least 1 — 6/4 over the randomness T, the following inequality holds 


N sgn (t; TE 2\ti] _ 3 
sup Fa, Fiale) ~ p Xda (e (0-0, A DY Yoe 


where T = {t1,... ty} are independent draws from the uniform distribution on (—1/2,1/2), 


wo =m (7 41) (E 42)" (o ( (E-N). 


and 


2riyj )) 


exp(—riy)(exp( Zrii- )—exp( 3 
v2 j(y) = i TF(1—exp(2riy/TF)) Ure (4.17) 
1/s y= 


Proof: Observe that 


2rinw 
> exp ( ) =0 for néeNTUN, 
weZn[-TF/2,TF/2) TF 
then 
— x-y) F(X; 
F-'[Xp, : Fl fal|(2) = Byen fal as iY) 
Denn =e, te) fale — (0,..-,0, #)*)v2,9(2) 
TF he P 
= fo P fu(a—(0,...,0, BH*)(2 + 1)20(1 + TF/2)v25(2)d (ey) ae) 
+ [onpy fale — (0,-.-50, GNC + 121m + TF/2)00,5(2)d sea) 


T. 


sgn TE }1)2ltl—1 x 
JEP, fale- (0,...,0, HOG Dy yu (t)dt, 


where supye—1/2,1/2) W) < OUn(TF)). 

When we fix x, the above integral can be calculated by Monte Carlo method, and the error can 
be estimated by Hoeffding’s inequality. However, when we want to obtain the uniform error for all x 
in T1, we need to use the Rademacher complexity [27, 28]. We consider the Rademacher complexity 
of the following function spaces 


Quw,u,Co = {qx (t) = u(t) sin(2rw( 


x — [sen (t)((4F +1) — 1)] } 
F 3 
and (TE 4 12 4 
z- [sgn (t)(( +1) )] 
( F J}, 
where w, x,t € R and u(t) is any real-valued function satisfying |u| < Co. Let Re(T; Qw,u,Co) denote 
the empirical Rademacher complexity for Qw,u,Co and 7, then 


Qu,u,Co > {q,(t) 2 u(t) cos(2rw 


Re(T: Quu,Co) = Eiei [supser TN, elt) 

S EenqaiN ESE Di 5i Cen zeta) < a, (=) 
where 
Yi = os (CE + 174 — 1)| u(ti), Yi2= ao l a E 1)! — 1)| alts); 


F F 
the last inequality in (4.19) is obtained by Lemma 26.10 in [27]. Similarly, we have Re(T; Goad < 
2C 
VN" 

Combining the last equality in (4.18) and Lemma A.10 in [28] proves Lemma 4.1. 


5 Conclusion 


The hashing transform we construct is also applicable to the case of lattice (as considered in [20, 21, 
22, 23]), which we have not covered in this paper. We do not consider the case of noise, so it is an issue 


10 


that need to be discussed later. In addition, whether the high-dimensional signal without frequency 
gap can be effectively reconstructed (just like one-dimensional case [25])? In terms of application, 
it is worthy to optimize the algorithm and the complexity estimation to make it more applicable to 
practice. 
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